{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9413d98a-352a-47b7-b84b-5b4a61b3c002",
   "metadata": {},
   "source": [
    "# Reddit Post Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97ebfa77-33f8-4cd1-9204-d73aeefc0fea",
   "metadata": {},
   "source": [
    "1. **Sets the Role and Tone**  \n",
    "   Instructs the AI to act as an **expert analyst** specializing in extracting insights from online forums like Reddit.\n",
    "\n",
    "2. **Guides Sentiment Analysis**  \n",
    "   Asks the AI to evaluate overall sentiment (e.g., positive, neutral, negative), and to present it as approximate percentages with a brief rationale.\n",
    "\n",
    "3. **Groups and Labels Themes**  \n",
    "   Instructs the AI to identify and cluster **key discussion themes**, perspectives, and emotional tones. Each theme should be explained and illustrated with **example comments**.\n",
    "\n",
    "4. **Creates an Insights Table**  \n",
    "   Requests a structured table with fields like *Perspectives, Frustrations, Tools, Suggestions* to concisely summarize the discussion’s core insights.\n",
    "\n",
    "5. **Describes Community Dynamics**  \n",
    "   Asks the AI to assess the **interaction style** (e.g., supportive, sarcastic, argumentative) and note any social patterns (e.g., consensus or conflict)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "425868ba-faec-4754-87f5-650f7529b319",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "#### Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9596f40f-5add-4602-91e3-cd7d2c753c33",
   "metadata": {},
   "outputs": [],
   "source": [
    "import praw\n",
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from IPython.display import Markdown, display, Image\n",
    "from openai import OpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e1a9999-4aad-416d-90fe-3b0841a4f455",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "#### Load Credentials"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "847843ce-ebf9-4f48-b625-82e3ed687c81",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Load environment variables in a file called .env\n",
    "\n",
    "load_dotenv(override=True)\n",
    "api_key = os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "# Check the key\n",
    "\n",
    "if not api_key:\n",
    "    print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
    "elif not api_key.startswith(\"sk-proj-\"):\n",
    "    print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
    "elif api_key.strip() != api_key:\n",
    "    print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
    "else:\n",
    "    print(\"API key found and looks good so far!\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c615d79b-55a0-4eb1-ad8b-a2e28c11b49e",
   "metadata": {},
   "outputs": [],
   "source": [
    "reddit = praw.Reddit(\n",
    "    client_id=os.getenv(\"REDDIT_CLIENT_ID\"),\n",
    "    client_secret=os.getenv(\"REDDIT_CLIENT_SECRET\"),\n",
    "    user_agent=os.getenv(\"REDDIT_USER_AGENT\"),\n",
    "    username=os.getenv(\"REDDIT_USERNAME\"),\n",
    "    password=os.getenv(\"REDDIT_PASSWORD\")\n",
    ")\n",
    "\n",
    "print(\"Authenticated as:\", reddit.user.me())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6df2224d-ecfd-4e07-9bc8-102eff257d69",
   "metadata": {},
   "outputs": [],
   "source": [
    "openai = OpenAI()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21ba0482-79e5-45ec-81d7-8611312c6b9e",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "#### Reddit Post Scraper"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8dc5276d-2d38-4651-9db0-c353076d6096",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "class RedditPostScraper:\n",
    "    def __init__(self, url):\n",
    "        self.submission = reddit.submission(url=url)\n",
    "        self.submission.comments.replace_more(limit=None)\n",
    "        self._title = self.submission.title\n",
    "        self._text = self.submission.selftext\n",
    "        self._comments = \"\"\n",
    "        self._formatted_comments = []  # for reprocessing if needed\n",
    "\n",
    "    def _generate_comments(self):\n",
    "        comments_list = []\n",
    "        for top_level in self.submission.comments:\n",
    "            top_author = top_level.author.name if top_level.author else \"[deleted]\"\n",
    "            comments_list.append(f\"{top_author}: {top_level.body}\")\n",
    "\n",
    "            for reply in top_level.replies:\n",
    "                reply_author = reply.author.name if reply.author else \"[deleted]\"\n",
    "                comments_list.append(\n",
    "                    f\"{reply_author} replied to {top_author}'s comment: {reply.body}\"\n",
    "                )\n",
    "        self._formatted_comments = comments_list\n",
    "\n",
    "    def title(self):\n",
    "        return f\"Title:\\n{self._title}\\n{self._text}\"\n",
    "\n",
    "    def comments(self, max_words=None):\n",
    "        if not self._formatted_comments:\n",
    "            self._generate_comments()\n",
    "\n",
    "        output_comments = []\n",
    "        total_words = 0\n",
    "\n",
    "        for comment in self._formatted_comments:\n",
    "            word_count = len(comment.split())\n",
    "            if max_words and total_words + word_count > max_words:\n",
    "                break\n",
    "            output_comments.append(comment)\n",
    "            total_words += word_count\n",
    "\n",
    "        return \"Text:\\n\" + \"\\n\\n\".join(output_comments)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3121cad0-4e2c-4d78-88e2-e72c6b99e2bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# post = RedditPostScraper(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")\n",
    "# print(post.title())\n",
    "# print(post.comments(2000))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "569760f6-5d68-40c1-9227-374c8e04d70a",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "#### System and User Prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22c0e89a-c076-4616-ae9b-b4cd588f39ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "system_prompt = '''You are an expert analyst specializing in extracting insights from online discussion forums. You will be given the title of a Reddit post and a list of comments (some with replies). Your task is to analyze the sentiment of the discussion and extract structured insights that reflect the collective responses.\n",
    "\n",
    "Your response **must be in well-formatted Markdown**. Use clear section headers (`##`, `###`), bullet points, and tables where appropriate.\n",
    "\n",
    "Perform the following tasks:\n",
    "\n",
    "---\n",
    "\n",
    "## 1. Overall Sentiment Breakdown\n",
    "\n",
    "- Determine the overall sentiment of the responses (e.g., positive, negative, neutral, mixed).\n",
    "- Express the sentiment as approximate percentages (e.g., 60% positive, 25% neutral, 15% negative).\n",
    "- Provide a short explanation for why the sentiment skews this way, referring to tone, topic sensitivity, controversy, humor, or supportiveness.\n",
    "\n",
    "---\n",
    "\n",
    "## 2. Thematic Grouping of Comments\n",
    "\n",
    "- Identify key recurring **themes, perspectives, or discussion threads** in the comments.\n",
    "- For each theme, create a subheading.\n",
    "- Under each:\n",
    "  - Briefly describe the focus or tone of that cluster (e.g., personal stories, criticism, questions, jokes).\n",
    "  - Include 1–2 **example comments** using quote formatting (`>`), preferably ones with replies or high engagement.\n",
    "\n",
    "---\n",
    "\n",
    "## 3. Insights Table\n",
    "\n",
    "If applicable, extract and structure insights into the following table. Leave any column empty if it’s not relevant to the post type:\n",
    "\n",
    "| Perspectives/ Motivations     | Pains/ Concerns/ Frustrations    | Tools / References / Resources       | Suggestions / Solutions            |\n",
    "|-------------------------------|----------------------------------|--------------------------------------|------------------------------------|\n",
    "| - ...                         | - ...                            | - ...                                | - ...                              |\n",
    "\n",
    "- Populate this table with concise bullet points.\n",
    "- Adapt categories to match the discussion type (e.g., switch \"Suggestions\" to \"Reactions\" if it's a news thread).\n",
    "\n",
    "---\n",
    "\n",
    "## 4. Tone and Community Dynamics\n",
    "\n",
    "- Comment on the **style and culture** of interaction: humor, sarcasm, empathy, trolling, intellectual debate, etc.\n",
    "- Mention any noticeable social dynamics: agreement/disagreement, echo chambers, respectful debate, or hostility.\n",
    "- Include casual or emotional comments if they illustrate community personality.\n",
    "\n",
    "---\n",
    "\n",
    "**Respond only in well-formatted Markdown.** Structure your output for clarity and insight, suitable for rendering in documentation, reports, or dashboards. Do not summarize every comment — focus on patterns, perspectives, and collective signals.\n",
    "\n",
    "'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf9d15d6-4f9a-45fd-96ed-d7097c7f03d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def user_prompt_for(post):\n",
    "    user_prompt = f\"You are looking at a Reddit discussion titled:\\n\\n{post.title()}\\n\\n\"\n",
    "    user_prompt += \"Below are the responses from various users. Analyze them according to the system prompt provided.\\n\"\n",
    "    user_prompt += \"Make sure your response is structured in Markdown with headers, lists, and tables as instructed.\\n\\n\"\n",
    "    user_prompt += post.comments(4000)\n",
    "    return user_prompt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f18c581c-ea30-4a43-9223-8c184dedb37e",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "#### Generating Responses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aadf8f41-aca3-41be-b18b-cb49a67ba256",
   "metadata": {},
   "outputs": [],
   "source": [
    "def messages_for(website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
    "    ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "feac9c61-f1f8-48f0-9189-bc60ac7fd755",
   "metadata": {},
   "outputs": [],
   "source": [
    "def summarize(url):\n",
    "    website = RedditPostScraper(url)\n",
    "    response = openai.chat.completions.create(\n",
    "        model = \"gpt-4o-mini\",\n",
    "        messages = messages_for(website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12b1d6dd-2d62-4136-8b8e-0a92134d4261",
   "metadata": {},
   "outputs": [],
   "source": [
    "# summarize(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd48253d-cdca-4c29-b4f2-c470290de63b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def display_summary(url):\n",
    "    summary = summarize(url)\n",
    "    display(Markdown(summary))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e0825a9-a3b0-43a0-b69c-cf0ce81d77d2",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "#### Example Usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a61a482-ec70-4e29-b99c-0d82298a32b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_summary(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a336777-a06e-4535-b68d-a6470eb1d701",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_summary(\"https://www.reddit.com/r/AskReddit/comments/1lam10k/how_do_you_feel_about_the_no_kings_protest/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6b12074-ffb6-4a6d-bdd2-bbbb78f82781",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_summary(\"https://www.reddit.com/r/canada/comments/1laq8ok/donald_trump_is_a_convicted_felon_could_he_be/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63b805e5-183f-439b-bfe7-9ee6bbe4a5b4",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
